....Yoshizawa [7] that can be considered a model for temporal pattern memory in animal motor systems.
....approximately as the network output after some time. We use tools from adaptive control theory to ....

....to the theory of linear time-varying systems, where this condition is generically true (under assumptions which
are also needed in the time-invariant case). However, we cannot show that the linearized system related to the
nonlinear neural network fulfils these generic assumptions. 1 Introduction: In [7], the following model for
learning of motions was proposed: trajectories of motion are assumed to be stored in some parts of the
motor nervous system. Whenever we try to memorize a new motion, e.g., riding a bicycle or swimming, we
achieve our goal by conscious repetition in the manner of supervised ....
....This algorithm has been independently derived in various forms by Robinson and Fallside (1987), Kuhn
(1987), Bachrach (1988; chapter ..., this volume), Mozer (1989; chapter ..., this volume), and Williams and Zipser
(1989a), and continuous-time versions have been proposed by Gherrity (1989), Doya and Yoshizawa (1989),
and Sato (1990a; 1990b). 5.1 The Algorithm: For each $k \in U$, $i \in U$, $j \in U \cup I$, and $t_0 \le t \le t_1$, we define

$$p^k_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}. \qquad (31)$$

This quantity measures the sensitivity of the value of the output of the $k$th unit at time $t$ to a small
increase in the value of $w_{ij}$, taking into ....
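The forward recursion that keeps $p^k_{ij}(t)$ up to date can be sketched in plain Python. This is a minimal sketch, not the chapter's code: it assumes the discrete-time update $s_k(t) = \sum_{l \in U \cup I} w_{kl} z_l(t)$, $y_k(t+1) = f(s_k(t))$ with $f = \tanh$, and the sensitivity rule $p^k_{ij}(t+1) = f'(s_k(t)) \left[\sum_{l \in U} w_{kl}\, p^l_{ij}(t) + \delta_{ik} z_j(t)\right]$ with $p^k_{ij}(t_0) = 0$. The helper names (`step`, `rtrl`, `final_output`) and the random test network are hypothetical.

```python
import math
import random

def step(w, y, x):
    """One update of the assumed net: s_k = sum_l w_kl z_l, y_k(t+1) = tanh(s_k).
    z(t) stacks the external inputs x(t) first, then the unit outputs y(t)."""
    z = list(x) + list(y)
    s = [sum(w_kl * z_l for w_kl, z_l in zip(row, z)) for row in w]
    return [math.tanh(s_k) for s_k in s], s, z

def rtrl(w, xs, y0):
    """Carry p[k][i][j] = d y_k(t) / d w_ij forward alongside the state."""
    n = len(w)              # number of units (the set U)
    nz = len(w[0])          # inputs plus units (the set U ∪ I)
    m = nz - n              # external inputs; unit l sits at z index m + l
    y = list(y0)
    p = [[[0.0] * nz for _ in range(n)] for _ in range(n)]   # p(t0) = 0
    for x in xs:
        y_next, s, z = step(w, y, x)
        p_next = [[[0.0] * nz for _ in range(n)] for _ in range(n)]
        for k in range(n):
            fprime = 1.0 - math.tanh(s[k]) ** 2
            for i in range(n):
                for j in range(nz):
                    # indirect effect through the other units' outputs
                    acc = sum(w[k][m + l] * p[l][i][j] for l in range(n))
                    if i == k:
                        acc += z[j]   # delta_ik * z_j(t): direct effect of w_kj on s_k
                    p_next[k][i][j] = fprime * acc
        y, p = y_next, p_next
    return y, p

def final_output(w, xs, y0):
    """Plain forward run, used only for the finite-difference check."""
    y = list(y0)
    for x in xs:
        y, _, _ = step(w, y, x)
    return y

# Usage: compare one sensitivity with a central finite difference.
random.seed(0)
n, m = 3, 2
w = [[random.uniform(-0.5, 0.5) for _ in range(m + n)] for _ in range(n)]
xs = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(5)]
y0 = [0.0] * n
y_T, p = rtrl(w, xs, y0)

k, i, j, eps = 0, 1, 3, 1e-6        # w_13 is a recurrent weight (z index 3 = unit 1)
w_plus = [row[:] for row in w]
w_plus[i][j] += eps
w_minus = [row[:] for row in w]
w_minus[i][j] -= eps
fd = (final_output(w_plus, xs, y0)[k] - final_output(w_minus, xs, y0)[k]) / (2 * eps)
print(p[k][i][j], fd)   # the two numbers should agree closely
```

Storing one $p^k_{ij}$ per unit and weight is what makes this approach memory-hungry: the sketch keeps $n \cdot n \cdot (n+m)$ numbers and updates every one of them at each step.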

....an approximation algorithm would provide an interesting blend of aspects of both truncated BPTT and
subgrouped RTRL. 9 Teacher Forcing: An interesting strategy that has appeared implicitly or explicitly in
the work of a number of investigators studying supervised learning tasks for recurrent nets (Doya &
Yoshizawa, 1989; Jordan, 1986; Narendra & Parthasarathy, 1990; Pineda, 1988; Rohwer & Renals, 1989;
Williams & Zipser, 1989a; 1989b) is to replace, during training, the actual output $y_k(t)$ of a unit by the
teacher signal $d_k(t)$ in subsequent computation of the behavior of the network, whenever such a
target ....
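Teacher forcing as described here can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical two-unit discrete-time network $y(t+1) = \tanh(W y(t))$ with no external inputs; `None` marks a time step at which a unit has no target. The function records the output the network actually produced (which is what the error is computed on) and only then substitutes $d_k(t)$ for the subsequent computation.

```python
import math

def simulate(w, y0, d, teacher_forcing):
    """Run y(t+1) = tanh(W y(t)) for len(d) steps and return the outputs y(t) the net produced.
    d[t][k] is the target for unit k at time t, or None if unit k has no target then.
    With teacher forcing, targeted outputs are replaced by d_k(t) before being fed back."""
    y = list(y0)
    produced = []
    for t in range(len(d)):
        produced.append(list(y))            # actual output y(t), used for the error
        if teacher_forcing:
            y = [y_k if d_k is None else d_k for y_k, d_k in zip(y, d[t])]
        y = [math.tanh(sum(w_kl * y_l for w_kl, y_l in zip(row, y))) for row in w]
    return produced

# Usage with hypothetical weights and targets (unit 2 has no target at t = 1).
w = [[0.5, -0.3], [0.8, 0.1]]
d = [[0.2, -0.1], [0.4, None], [0.0, 0.5]]
free_run = simulate(w, [1.0, 1.0], d, teacher_forcing=False)
forced_run = simulate(w, [1.0, 1.0], d, teacher_forcing=True)
```

Because every unit has a target at $t = 0$, the teacher-forced run reaches the same $y(1)$ from any initial state, whereas the free run does not.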
....association model. One might think that this is due to an improper weight matrix, and that
synchronization would not be necessary if the pattern sequence were stored using a proper learning algorithm
such as the recurrent backpropagation (BP) algorithm (e.g., Pineda, 1987; Pearlmutter, 1989; Doya &
Yoshizawa, 1989). In practice, however, as long as the network dynamics are continuous in time, learning does not
succeed unless the network is small enough and only very limited patterns are learned. The reason for this
is that BP learning only attempts to minimize the difference (mean squared error) between ....
....and Zipser (1989a), and continuous-time versions have been proposed by Gherrity (1989) and by Doya and
Yoshizawa (1989). 5.1 The Algorithm: For each $k \in U$, $i \in U$, $j \in U \cup I$, and $t_0 \le t \le t_1$, we define

$$p^k_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}. \qquad (28)$$

This quantity measures the sensitivity of the value of the output of the $k$th unit at time $t$ to a small increase in
the value of $w_{ij}$, taking into account the effect of ....

....an approximation algorithm would provide an interesting blend of aspects of both truncated BPTT and
subgrouped RTRL. 9 Teacher Forcing: An interesting strategy that has appeared implicitly or explicitly in
the work of a number of investigators studying supervised learning tasks for recurrent nets (Doya &
Yoshizawa, 1989; Jordan, 1986; Narendra & Parthasarathy, 1988; Pineda, 1988; Rohwer & Renals, 1989;
Williams & Zipser, 1989a; 1989b) is to replace, during training, the actual output $y_k(t)$ of a unit by the ....
....results. This approach does not analyse the mechanism by which the periodic signal is generated, nor does it
make any attempt to characterise the set of parameter values for which the RNN has periodic solutions.
Consequently, there is no guarantee that such a set of parameters exists ....

....(approximately) as its output, the periodic teaching signal. As mentioned in the Introduction, it has been
shown experimentally that a class of recurrent networks with configurations similar to the one
considered here is indeed able to learn and replicate certain types of periodic signals; see Doya et al.
(1989), Pearlmutter (1995), and Yang et al. (1994). We are interested in proving that such learning and
replication have taken place. In the context of our learning and replication process, there are two crucial aspects. We
must prove that the Teaching Network produces periodic signals as its output and we ....
....Robinson and Fallside, and Narendra and Parthasarathy (1990), have called it dynamic backpropagation.
....gain for some tasks is paid for by a loss of ability to deal with arbitrary, unknown causal delays between inputs
and targets. In fact, state decay does not significantly improve experimental performance (see State Decay
in Table 2). Of course, we might try to teacher force (Jordan, 1986; Doya and Shuji, 1989) the internal
states $s_c$ by resetting them once a new training sequence starts. But this requires an external teacher that ....
....Conventional Learning Algorithm: Two typical learning algorithms have been proposed for recurrent neural networks.
One is BPTT (Backpropagation Through Time; [1] etc. for discrete time, [2] etc. for continuous time), and the other
is RTRL (Real-Time Recurrent Learning; [3] etc. for discrete time, [4] etc. for continuous time). In BPTT, the error
must be propagated backward into the past, which means that the past states of the neural network have to
be stored. If the propagation is truncated at T time steps, which is called truncated BPTT(T), the neural network
cannot memorize the signals ....
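The storage requirement and the effect of truncation can be seen on a toy example. The sketch below is illustrative only: it assumes a scalar network $h_t = \tanh(w h_{t-1} + u x_t)$ with a squared error on the final output, and the helper names are hypothetical. With `T=None` it performs full BPTT over the stored states, which agrees with a finite-difference estimate; with a small `T` it drops the contributions of earlier time steps, so the gradient no longer sees dependencies reaching further back than T steps.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Scalar RNN h_t = tanh(w*h_{t-1} + u*x_t); returns the stored states [h_0, ..., h_N]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def bptt_grad_w(w, u, xs, d, T=None):
    """dL/dw for L = 0.5*(h_N - d)**2, propagating the error back at most T steps.
    T=None means full BPTT."""
    hs = forward(w, u, xs)
    N = len(xs)
    steps = N if T is None else min(T, N)
    delta = hs[N] - d                      # dL/dh_N
    grad = 0.0
    for t in range(N, N - steps, -1):
        da = delta * (1.0 - hs[t] ** 2)    # dL/da_t, where a_t = w*h_{t-1} + u*x_t
        grad += da * hs[t - 1]             # contribution of w at step t
        delta = da * w                     # error passed back to h_{t-1}
    return grad

def loss(w, u, xs, d):
    return 0.5 * (forward(w, u, xs)[-1] - d) ** 2

# Usage with hypothetical values.
w, u, d = 0.9, 0.5, 0.3
xs = [1.0, -0.5, 0.25, 0.8, -1.0, 0.6]
eps = 1e-6
fd = (loss(w + eps, u, xs, d) - loss(w - eps, u, xs, d)) / (2 * eps)
full = bptt_grad_w(w, u, xs, d)
trunc2 = bptt_grad_w(w, u, xs, d, T=2)   # BPTT(2)
print(full, fd, trunc2)
```

Full BPTT needs every past state `hs`; BPTT(T) only needs the last T, which is exactly the trade-off between memory and the temporal range the gradient can account for.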
[Article contains additional citation context not shown here]

Doya, K., & Yoshizawa, S. (1989). Adaptive neural oscillator using continuous-time back-propagation learning.
Neural Networks, 2, 375-386.



Bifurcations of Recurrent Neural Networks in Gradient Descent.. - Kenji Doya (1993) (7 citations) Self-citation
(Doya)

....networks can model arbitrary dynamical systems [3]. Backpropagation learning schemes for multilayer
feedforward networks have been successfully applied to a wide range of problems. In contrast, although gradient
descent learning algorithms for recurrent networks became popular several years ago [19, 5, 18, 25],
few cases of their successful application to large-scale problems have been reported. One reason for
this is the large cost of gradient computation [26]. However, another critical issue in training recurrent networks is
bifurcation of the network dynamics. In general, asymptotic ....

....that similar problems (sudden increases of the error, explosion of the error gradient, and a learning process
that never converges) are encountered in training large-scale recurrent networks on practical tasks. In fact, in many
simulations we found that the error curves have several steep jumps [5]. Such an increase in error is often
regarded as a numerical artifact, for example, the trajectory in parameter space bouncing between
steep cliffs of the error surface, which is frequently encountered in the feedforward case [22]. However, in the case of
recurrent networks, the error ....
....network, a change in a weight can affect the future behavior of the entire network. Learning algorithms that
take this recurrent effect into account have been obtained for both discrete-time models (Rumelhart et al.,
1986; Williams and Zipser, 1989) and continuous-time models (Pearlmutter, 1989; Doya and Yoshizawa,
1989; Rowat and Selverston, 1991). The basic principle is to run a linearized version of the network
dynamics and estimate the effect of a small change in a weight on the error function. There are two ways
of doing this sensitivity analysis: one is to run the linearized system forward in time and ....

....have been studied. Here, we focus on the following model (Pineda, 1988; Pearlmutter, 1989):

$$\tau_i \dot{y}_i(t) = -y_i(t) + f\left(\sum_{j=1}^{n+m} w_{ij} z_j(t)\right), \quad i = 1, \ldots, n, \qquad (9)$$

$$z_j(t) = \begin{cases} y_j(t), & j \le n, \\ u_{j-n}(t), & j > n. \end{cases}$$

However, similar derivations apply to other models as well (Doya and Yoshizawa, 1989; Rowat and Selverston, 1991). We
define an error integral

$$E = \int_0^T \sum_{i=1}^{n} \lambda_i(t)\, \frac{1}{2}\left(y_i(t) - d_i(t)\right)^2 dt \qquad (10)$$

and derive a gradient descent algorithm for minimizing $E$ for a desired trajectory $(d_1(t), \ldots, d_n(t))$, $0 \le t \le T$,
with a given initial state $(y_1(0), \ldots, y_n(0))$ ....
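Model (9) and the error integral (10) can be simulated directly. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses forward-Euler steps, $f = \tanh$, and a left rectangle rule for the integral, and it takes the per-unit weighting factor in the integrand of (10) as a user-supplied function `lam` (a hypothetical name, e.g. 1 where a target is given and 0 elsewhere). Inputs, targets, and weights are passed as plain functions and lists.

```python
import math

def simulate_ctrnn(w, tau, y0, u, d, lam, T, dt, f=math.tanh):
    """Forward-Euler sketch of model (9) plus a rectangle-rule estimate of E in (10).
    w[i] has n+m entries (unit outputs first, then inputs); u(t), d(t), lam(t) return lists."""
    n = len(y0)
    y = list(y0)
    traj = [list(y)]
    E = 0.0
    for step in range(int(round(T / dt))):
        t = step * dt
        target, weight = d(t), lam(t)
        E += dt * sum(weight[i] * 0.5 * (y[i] - target[i]) ** 2 for i in range(n))
        z = y + u(t)                              # z_j = y_j (j <= n), u_{j-n} (j > n)
        net = [sum(w_ij * z_j for w_ij, z_j in zip(w[i], z)) for i in range(n)]
        y = [y[i] + dt / tau[i] * (-y[i] + f(net[i])) for i in range(n)]
        traj.append(list(y))
    return traj, E

# Usage: with all weights zero, f(0) = 0 and each unit decays as y_i(0) * exp(-t / tau_i),
# which gives a simple closed form to compare against.
w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
traj, E = simulate_ctrnn(w, tau=[1.0, 0.5], y0=[1.0, -0.5],
                         u=lambda t: [0.0], d=lambda t: [0.0, 0.0],
                         lam=lambda t: [1.0, 1.0], T=1.0, dt=1e-3)
```

A smaller `dt` brings both the trajectory and the estimate of $E$ closer to the continuous-time values.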

....elaborate architectures have been considered for modeling various classes of dynamical systems [7]. Note that
the state of such networks is updated sequentially from the input layer to the output layer. Another approach to
modeling dynamical systems is the use of fully connected recurrent networks [2, 8, 9, 10]. In these models,
each unit has either a discrete or a continuous time delay and is updated in parallel. If such units are used to
implement the nodes in the two-layer recurrent network (as in Figure 1(a), but now with delays at all nodes), the
performance of the network is not obvious. In a ....

....in (10). If we take

$$z(t) \equiv W_1 x(t) \qquad (14)$$

as a new state variable, we have from (10)

$$\frac{d}{dt} z(t) = W_1 \frac{d}{dt} x(t) = -W_1 x(t) + W_1 W_2 S(W_1 x(t) + V u(t)),$$

and therefore

$$\frac{d}{dt} z(t) = -z(t) + W_1 W_2 S(z(t) + V u(t)). \qquad (15)$$

This is another form of the continuous-time models considered in [4, 2]. 4 Discussion: Using the universality theorem of two-layer
networks, we showed that any discrete- or continuous-time dynamical system can be modeled by a fully connected
discrete- or continuous-time recurrent network, respectively, provided the network consists of enough units.
However, it does not ....
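The change of variables can be checked numerically. This sketch assumes that (10) is $\dot{x}(t) = -x(t) + W_2 S(W_1 x(t) + V u(t))$, which is what the derivation implies, with $S = \tanh$ applied elementwise, a constant input $u$, and arbitrary small matrices chosen for illustration. Integrating $x$ and $z$ with the same Euler step, $W_1 x$ and $z$ coincide up to rounding, because multiplying the Euler update of $x$ by $W_1$ yields exactly the Euler update of $z$.

```python
import math

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def S(v):
    return [math.tanh(x) for x in v]      # assumed elementwise sigmoid-type nonlinearity

def euler_x(x, W1, W2, V, u, dt, steps):
    """Euler steps of dx/dt = -x + W2 S(W1 x + V u)."""
    for _ in range(steps):
        a = [p + q for p, q in zip(matvec(W1, x), matvec(V, u))]
        x = [xi + dt * (-xi + gi) for xi, gi in zip(x, matvec(W2, S(a)))]
    return x

def euler_z(z, W1W2, V, u, dt, steps):
    """Euler steps of dz/dt = -z + W1 W2 S(z + V u), i.e. form (15)."""
    for _ in range(steps):
        a = [p + q for p, q in zip(z, matvec(V, u))]
        z = [zi + dt * (-zi + gi) for zi, gi in zip(z, matvec(W1W2, S(a)))]
    return z

# Usage with hypothetical 3x2, 2x3 and 3x1 matrices.
W1 = [[0.5, -1.0], [0.3, 0.8], [1.2, 0.1]]
W2 = [[0.4, -0.2, 0.7], [0.1, 0.9, -0.5]]
V = [[0.3], [-0.6], [0.2]]
u = [1.0]
x0 = [0.2, -0.4]
x_end = euler_x(x0, W1, W2, V, u, 0.01, 500)
z_end = euler_z(matvec(W1, x0), matmul(W1, W2), V, u, 0.01, 500)
print(matvec(W1, x_end), z_end)
```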
....an automatic parameter tuning algorithm for H-H type neuron models [5]. Since an H-H type model is a
network of sigmoid functions, multipliers, and leaky integrators (Figure 1), we can tune its parameters in a
manner similar to the tuning of connection weights in continuous-time neural network models [6, 12]. By
training a model from many initial parameter points to match the experimental data, we can systematically
estimate a region in the parameter space instead of a single point. We first test whether the parameters of a spiking
neuron model can be identified from the membrane potential trajectories ....

....potential trajectory. We first derive the gradient of $E$ with respect to the model parameters (..., $g_j$, $v_{a_j}$,
$s_{a_j}$, $t_{a_j}$, ...) .... In studies of recurrent neural networks, it has been shown that teacher forcing is very
important in training autonomous oscillation patterns [4, 6, 12, 13]. In H-H type models, teacher forcing
drives the activation and inactivation variables by the target membrane potential $v^*(t)$ instead of $v(t)$, as follows:

$$\dot{x} = k_x(v^*(t)) \cdot \left(-x + x_\infty(v^*(t))\right), \quad x = a_j, b_j. \qquad (6)$$

We use (6) in place of (2) during training. The effect of a
small....
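A sketch of the difference between (2) and (6) for a single gating variable follows. The voltage dependence is hypothetical: $x_\infty(v)$ is taken as a sigmoid and $k_x(v)$ as a constant, with made-up parameter values, and the voltage traces are placeholders rather than recorded data. Only the driving potential changes between the free run and the teacher-forced run.

```python
import math

def x_inf(v, v_half=-40.0, slope=5.0):
    """Hypothetical steady-state gating curve: a sigmoid of membrane potential (mV)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))

def k_x(v, k0=0.2):
    """Hypothetical rate constant (1/ms), taken as voltage-independent to keep the sketch small."""
    return k0

def integrate_gate(x0, v_trace, dt):
    """Forward-Euler steps of dx/dt = k_x(v) * (-x + x_inf(v)) along a prescribed voltage trace."""
    xs = [x0]
    for v in v_trace:
        x = xs[-1]
        xs.append(x + dt * k_x(v) * (-x + x_inf(v)))
    return xs

dt, n_steps = 0.1, 1000
v_model = [-60.0] * n_steps    # the model's own v(t) (placeholder values)
v_target = [-40.0] * n_steps   # the target v*(t) (placeholder values)
free_run = integrate_gate(0.0, v_model, dt)          # as in (2): driven by v(t)
teacher_forced = integrate_gate(0.0, v_target, dt)   # as in (6): driven by v*(t)
```

With teacher forcing, the gating variables follow the target potential even when the model's own membrane potential is still far from it, which keeps the linearized sensitivities evaluated along the desired trajectory.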
Bifurcations in the Learning of Recurrent Neural Networks - Kenji Doya (1992) (8 citations) Self-citation
....0, $i \ne k$. Then the gradient of the average error is derived as follows:

$$\frac{\partial E}{\partial w_{kl}} = \frac{1}{T}\int_0^T \sum_{i=1}^{o} \frac{\partial E(t)}{\partial x_i(t)} \frac{\partial x_i(t)}{\partial w_{kl}}\, dt = \frac{1}{T}\int_0^T \sum_{i=1}^{o} \left(y_i(t) - d_i(t)\right) g'(x_i(t))\, p^i_{kl}(t)\, dt. \qquad (5)$$

This computation method is called real-time recurrent learning (RTRL) [3, 17]. Another method uses the adjoint equation of (4),

$$\frac{dq_i(t)}{dt} = \frac{1}{\tau_i} q_i(t) - \sum_{j=1}^{n} \frac{w_{ji}}{\tau_j} g'(x_i(t))\, q_j(t) - \delta_i(t), \qquad (6)$$

$$q_i(T) = 0 \quad (i = 1, \ldots, n),$$

by using the output error as the input

$$\delta_i(t) = \frac{\partial E(t)}{\partial x_i(t)} = \begin{cases} \left(y_i(t) - d_i(t)\right) g'(x_i(t)), & i \le o, \\ 0, & i > o. \end{cases}$$
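The two gradient computations can be compared on a small network. This is a sketch under stated assumptions: (4) is taken to be $\tau_i \dot{x}_i = -x_i + \sum_j w_{ij} g(x_j)$ with $g = \tanh$, $y_i = g(x_i)$, and units $1, \ldots, o$ as outputs; time is discretized with forward Euler, so the backward pass is the discrete counterpart of the adjoint equation rather than (6) itself. Function names and test values are hypothetical.

```python
import math

def g(x):
    return math.tanh(x)

def dg(x):
    return 1.0 - math.tanh(x) ** 2

def simulate(w, tau, x0, dt, N):
    """Euler steps of the assumed form of (4): tau_i dx_i/dt = -x_i + sum_j w_ij g(x_j)."""
    n = len(x0)
    xs = [list(x0)]
    for _ in range(N):
        x = xs[-1]
        xs.append([x[i] + dt / tau[i] * (-x[i] + sum(w[i][j] * g(x[j]) for j in range(n)))
                   for i in range(n)])
    return xs

def loss(w, tau, x0, d, o, dt, N):
    """Euler version of the average error: (1/T) * sum_s dt * sum_{i<o} 0.5*(g(x_i) - d_i)^2."""
    xs = simulate(w, tau, x0, dt, N)
    return sum(0.5 * (g(xs[s][i]) - d(s * dt)[i]) ** 2 for s in range(N) for i in range(o)) / N

def grad_forward(w, tau, x0, d, o, dt, N):
    """RTRL-style: carry p[i][k][l] = dx_i/dw_kl forward with the state, then apply (5)."""
    n, T = len(x0), dt * N
    xs = simulate(w, tau, x0, dt, N)
    p = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    grad = [[0.0] * n for _ in range(n)]
    for s in range(N):
        x, target = xs[s], d(s * dt)
        for k in range(n):
            for l in range(n):
                grad[k][l] += dt / T * sum((g(x[i]) - target[i]) * dg(x[i]) * p[i][k][l]
                                           for i in range(o))
        p = [[[p[i][k][l] + dt / tau[i] * (-p[i][k][l]
                                             + sum(w[i][j] * dg(x[j]) * p[j][k][l] for j in range(n))
                                             + (g(x[l]) if i == k else 0.0))
               for l in range(n)] for k in range(n)] for i in range(n)]
    return grad

def grad_adjoint(w, tau, x0, d, o, dt, N):
    """Adjoint method: run x forward, then q backward starting from q = 0 at the final time."""
    n, T = len(x0), dt * N
    xs = simulate(w, tau, x0, dt, N)
    q = [0.0] * n
    grad = [[0.0] * n for _ in range(n)]
    for s in range(N - 1, -1, -1):
        x, target = xs[s], d(s * dt)
        for k in range(n):            # here q holds dE/dx at step s+1
            for l in range(n):
                grad[k][l] += q[k] * dt / tau[k] * g(x[l])
        # dE/dx at step s: error injected at s plus the backward pass through the Euler update
        q = [(dt / T * (g(x[i]) - target[i]) * dg(x[i]) if i < o else 0.0)
             + q[i] * (1.0 - dt / tau[i])
             + sum(q[j] * dt / tau[j] * w[j][i] * dg(x[i]) for j in range(n))
             for i in range(n)]
    return grad

def target_signal(t):
    return [0.5 * math.sin(t)]        # hypothetical target for the single output unit

w = [[0.2, -0.5, 0.3], [0.7, 0.1, -0.4], [-0.3, 0.6, 0.2]]
tau, x0 = [1.0, 0.5, 2.0], [0.1, -0.2, 0.3]
gf = grad_forward(w, tau, x0, target_signal, 1, 0.05, 40)
ga = grad_adjoint(w, tau, x0, target_signal, 1, 0.05, 40)
```

Both gradients agree to rounding error. The forward method stores $n^3$ sensitivities and costs $O(n^4)$ per step, while the adjoint pass carries only $n$ variables per step but needs the stored trajectory.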

....train the network beyond the bifurcation boundaries. 4.1 Non-recurrent learning algorithms: One simple way to
avoid the instability of the learning dynamics is to use non-recurrent learning rules. Feedforward
approximations of recurrent dynamics were successfully used in sequence generation [3, 9] and
sequence prediction tasks [2, 5]. Those learning rules are derived from (4) by setting the weights $w_{ij}$ to
zero except for those of the hidden-to-output connections ($i = 1, \ldots, o$; $j = o+1, \ldots, n$). They need only $O(on^2)$
computations, whereas RTRL requires $O(n^4)$ ....
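For a sense of scale, with illustrative sizes that are not from the source and ignoring constant factors, the per-time-step ratio of the two costs grows with the square of the network size:

```latex
\frac{\text{RTRL cost}}{\text{feedforward-approximation cost}}
  \sim \frac{n^4}{o\,n^2} = \frac{n^2}{o},
\qquad n = 100,\; o = 10:\quad \frac{10^{8}}{10^{5}} = 10^{3}.
```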
....citation (Doya, Yoshizawa)

....negative and positive feedback pathways are found in those systems. Elucidation of the function of the
sensory inputs to CPGs requires computational studies of neural and physical dynamical systems. Algorithms
for the learning of rhythmic patterns in recurrent neural networks have been derived by Doya and
Yoshizawa (1989), Pearlmutter (1989), and Williams and Zipser (1989). In this paper, we propose a learning
algorithm for synchronizing a neural oscillator to rhythmic input signals with a specific phase
relationship. It is well known that coupling between nonlinear oscillators can entrain their frequencies.
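Frequency entrainment by coupling can be illustrated with the simplest possible model. The sketch deliberately swaps the neural CPGs for two phase oscillators with symmetric sine coupling (a Kuramoto-type model, not the networks used in this work), borrowing the periods $T_1 = 4.0$ and $T_2 = 5.0$ from the CPG experiment reported in this context. For this model the phase difference $\phi$ obeys $\dot{\phi} = \Delta\omega - 2K \sin\phi$, so the frequencies lock when $2K \ge |\Delta\omega|$, here $K \ge \pi/20 \approx 0.157$. The function name and coupling values are hypothetical.

```python
import math

def entrain(T1, T2, K, dt=0.01, t_end=200.0):
    """Two phase oscillators with symmetric sine coupling, integrated with forward Euler.
    Returns each oscillator's mean angular frequency over the second half of the run."""
    w1, w2 = 2 * math.pi / T1, 2 * math.pi / T2
    th1, th2 = 0.0, 0.0
    mid1, mid2 = 0.0, 0.0
    steps = int(round(t_end / dt))
    half = steps // 2
    for s in range(steps):
        if s == half:                      # record phases once transients have died out
            mid1, mid2 = th1, th2
        d1 = w1 + K * math.sin(th2 - th1)
        d2 = w2 + K * math.sin(th1 - th2)
        th1, th2 = th1 + dt * d1, th2 + dt * d2
    span = (steps - half) * dt
    return (th1 - mid1) / span, (th2 - mid2) / span

locked = entrain(4.0, 5.0, K=0.3)      # above threshold: both run at a common frequency
drifting = entrain(4.0, 5.0, K=0.05)   # below threshold: the faster oscillator keeps slipping ahead
print(locked, drifting)
```

In the locked case the common frequency is the mean of the two natural frequencies, because the coupling terms cancel in the sum of the phases.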

....curves represent $y^1_i(t)$ and $y^2_i(t)$, respectively. a: without coupling. b: $\Delta T = 0.0$. c: $\Delta T = 1.0$. d: $\Delta T = 2.0$. e: $\Delta T =
3.0$. First, two CPGs were trained independently to oscillate with sinusoidal waveforms of period $T_1 = 4.0$
and $T_2 = 5.0$ using continuous-time backpropagation learning (Doya and Yoshizawa, 1989). Each CPG
was composed of two neurons ($C = 2$) with time constants $\tau = 1.0$ and output functions $g(\cdot) = \tanh(\cdot)$. Instead of following
the two-step procedure described in the previous section, the network dynamics (5) and the learning equations (3)
and (6) were simulated concurrently with ....



CiteSeer - citeseer.org - Terms of Service - Privacy Policy - Copyright © 1997-2002 NEC Research Institute 



http://216.239.57. 104/search?q=cache:NkxbSanl5MoJ:citeseer.nj.nec.com/context/73331/0.. 



12/27/03 



TC2100 






Request a Book/Journal Purchase 

Request a Book or Article 

Request a Foreign Patent Publication 

Request a Prior Art Search 


Fast & Focused Search Criteria 
STIC Online Catalog 
Translation Services 

Web Resources 

A Brief History of the Hard Disk Drive 

CiteSeer (ResearchIndex)

(Full text scientific research papers - in pdf and postscript formats.) 
Internet Engineering Task Force 

(The IETF Secretariat, run by The Corporation for National Research Initiatives with funding from the
US government, maintains an index of Internet-Drafts.)
Nanotechnology

Requests for Comments (RFCs) Database 

(Requests for Comments (RFC) document series is a set of technical and 
organizational notes about the Internet (originally the ARPANET), beginning in 1969 and 
discussing many aspects of computer networking, including protocols, procedures and 
concepts as well as meeting notes and opinions.) 

Usenet Archive (Google Groups)

Wayback Machine

(Archived web pages.)

Last Modified: 11/19/2003 15:07:41



Submit comments and suggestions to Anne Hendrickson 






http://ptoweb/patents/stic/stic-tc2 1 00.htm 



12/5/03 



